[Performance] Add --enable-ep-weight-filter CLI option by esmeetu · Pull Request #37351 · vllm-project/vllm

esmeetu · 2026-03-17T22:39:22Z

Summary

Add --enable-ep-weight-filter opt-in CLI flag to skip non-local expert weights during model loading when EP is active
Makes the EP weight filter from [Performance][Model Loader] Skip non-local expert weights during EP model loading #37136 an explicit opt-in feature
No behavior change without the flag
The related regression is described in [Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy #37322
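To make the idea of "skipping non-local expert weights" concrete, here is a minimal sketch of a per-rank filter. It is not the code from this PR or from #37136. The tensor naming scheme (`.experts.<id>.`), the contiguous expert-to-rank partition, and the helper names `local_expert_range` and `should_load` are all assumptions made for illustration.

```python
import re

# Assumption: per-expert tensors carry ".experts.<id>." in their checkpoint name.
_EXPERT_ID = re.compile(r"\.experts\.(\d+)\.")


def local_expert_range(num_experts: int, ep_size: int, ep_rank: int) -> range:
    """Contiguous block of expert ids owned by ep_rank (assumes even division)."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size in this sketch")
    per_rank = num_experts // ep_size
    return range(ep_rank * per_rank, (ep_rank + 1) * per_rank)


def should_load(name: str, local_ids: range) -> bool:
    """Keep every non-expert tensor; keep expert tensors only if they are local."""
    match = _EXPERT_ID.search(name)
    if match is None:
        return True  # attention, norms, embeddings, etc. are never filtered
    return int(match.group(1)) in local_ids


names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.experts.0.w1.weight",
    "model.layers.0.mlp.experts.5.w1.weight",
    "model.layers.0.mlp.experts.7.w2.weight",
]
# Rank 1 of 2 owns the second half of the 8 experts in this toy setup.
local = local_expert_range(num_experts=8, ep_size=2, ep_rank=1)
kept = [n for n in names if should_load(n, local)]
```

In this toy setup, rank 1 keeps the attention weight plus the tensors for experts 5 and 7, and never reads the tensor for expert 0.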

Usage

vllm serve model \
  --enable-expert-parallel \
  --enable-ep-weight-filter

Without --enable-ep-weight-filter, loading behavior is identical to main.

Test plan

vllm serve without --enable-ep-weight-filter — no behavior change
vllm serve --enable-expert-parallel --enable-ep-weight-filter on per-expert MoE — correct loading, reduced I/O
Non-MoE model with flag — no effect
3D fused-expert model with flag — no effect (filter returns None)
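The four cases in the test plan can be read as a small decision table for when filtering applies. The sketch below encodes that table under stated assumptions: the function name `build_local_expert_ids` and all of its parameters are hypothetical, and returning `None` stands for "no filter, load everything", as the last test-plan item suggests.

```python
from typing import FrozenSet, Optional


def build_local_expert_ids(
    enable_expert_parallel: bool,
    enable_ep_weight_filter: bool,
    num_experts: Optional[int],  # None models a non-MoE checkpoint
    experts_are_fused: bool,     # True models 3D fused-expert tensors
    ep_size: int,
    ep_rank: int,
) -> Optional[FrozenSet[int]]:
    """Return the expert ids this rank should read, or None to disable filtering."""
    if not (enable_expert_parallel and enable_ep_weight_filter):
        return None  # flag not set, or EP inactive: behave as before
    if num_experts is None:
        return None  # non-MoE model: nothing to filter
    if experts_are_fused:
        return None  # one fused tensor holds all experts: cannot skip per expert
    per_rank = num_experts // ep_size  # assumes an even, contiguous split
    return frozenset(range(ep_rank * per_rank, (ep_rank + 1) * per_rank))


no_flag = build_local_expert_ids(True, False, 8, False, ep_size=2, ep_rank=0)
per_expert = build_local_expert_ids(True, True, 8, False, ep_size=2, ep_rank=0)
non_moe = build_local_expert_ids(True, True, None, False, ep_size=2, ep_rank=0)
fused = build_local_expert_ids(True, True, 8, True, ep_size=2, ep_rank=0)
```

Only the per-expert MoE case with both flags set yields a set of expert ids. Every other case falls back to unfiltered loading.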

🤖 Generated with Claude Code

Add opt-in flag to skip non-local expert weights during model loading when expert parallelism is active. Each rank only reads its own expert shard from disk, reducing storage I/O for MoE models with per-expert weight tensors.

Signed-off-by: esmeetu <esmeetu@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>

gemini-code-assist

Code Review

This pull request introduces an opt-in command-line flag --enable-ep-weight-filter to optimize model loading for Mixture-of-Experts models with expert parallelism. The changes correctly add the new configuration option and integrate it into the model loading logic. My main feedback is to add a validation check to ensure enable_expert_parallel is active when enable_ep_weight_filter is used, to prevent silent failures from misconfiguration and improve user experience.

gemini-code-assist · 2026-03-17T22:47:57Z

    """Whether the deployed model is MoE (if known)."""
    enable_expert_parallel: bool = False
    """Use expert parallelism instead of tensor parallelism for MoE layers."""
+    enable_ep_weight_filter: bool = False


To improve robustness and prevent user confusion from misconfiguration, it's a good practice to validate that enable_expert_parallel is enabled when enable_ep_weight_filter is used. Currently, if a user enables enable_ep_weight_filter without enable_expert_parallel, it will fail silently.

Consider adding a validation check in the _validate_parallel_config method of this class, similar to how enable_eplb is validated. This would raise an error for invalid combinations.

Example:

    if self.enable_ep_weight_filter and not self.enable_expert_parallel:
        raise ValueError(
            "enable_expert_parallel must be True to use enable_ep_weight_filter."
        )
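For readers who want to try the suggested check in isolation, here is a self-contained sketch. `ParallelSettings` and `try_build` are hypothetical stand-ins, not vLLM's config class or its `_validate_parallel_config` method. Only the two field names and the error message come from the review comment above.

```python
from dataclasses import dataclass


@dataclass
class ParallelSettings:  # hypothetical stand-in for the real config class
    enable_expert_parallel: bool = False
    enable_ep_weight_filter: bool = False

    def __post_init__(self) -> None:
        # Reject the combination that would otherwise be silently ignored.
        if self.enable_ep_weight_filter and not self.enable_expert_parallel:
            raise ValueError(
                "enable_expert_parallel must be True to use enable_ep_weight_filter."
            )


def try_build(**kwargs: bool) -> str:
    """Report whether a combination of flags passes validation."""
    try:
        ParallelSettings(**kwargs)
        return "ok"
    except ValueError as exc:
        return f"rejected: {exc}"


both_on = try_build(enable_expert_parallel=True, enable_ep_weight_filter=True)
filter_only = try_build(enable_ep_weight_filter=True)
defaults = try_build()
```

Raising at construction time turns a misconfiguration into an immediate error instead of an option that has no effect.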

Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit 761e0aa)

…37351) Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…37351) Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit 761e0aa)

…37351) Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit 5eb1ef3)

…37351) Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit 4011dab)

esmeetu requested review from 22quinn, ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners March 17, 2026 22:39

esmeetu mentioned this pull request Mar 17, 2026

[Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy #37322

Merged

3 tasks

tlrmchlsmth approved these changes Mar 17, 2026

gemini-code-assist Bot reviewed Mar 17, 2026

esmeetu added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Mar 17, 2026

khluu added this to the v0.18.0 cherry picks milestone Mar 18, 2026

esmeetu merged commit 761e0aa into main Mar 18, 2026
69 checks passed

esmeetu deleted the opt-ep-weights-filter branch March 18, 2026 01:36

esmeetu merged 1 commit into main from opt-ep-weights-filter

3 participants
